{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "g9QvCK6Su4hY"
      },
      "source": [
        "##### Copyright 2018 The AdaNet Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "both",
        "colab": {
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "id": "EUman0Uju6SR"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "l5CQ48UOxrqO"
      },
      "source": [
        "# The AdaNet objective\n",
        "\n",
        "One of key contributions from *AdaNet: Adaptive Structural Learning of Neural\n",
        "Networks* [[Cortes et al., ICML 2017](https://arxiv.org/abs/1607.01097)] is\n",
        "defining an algorithm that aims to directly minimize the DeepBoost\n",
        "generalization bound from *Deep Boosting*\n",
        "[[Cortes et al., ICML 2014](http://proceedings.mlr.press/v32/cortesb14.pdf)]\n",
        "when applied to neural networks. This algorithm, called **AdaNet**, adaptively\n",
        "grows a neural network as an ensemble of subnetworks that minimizes the AdaNet\n",
        "objective (a.k.a. AdaNet loss):\n",
        "\n",
        "$$F(w) = \\frac{1}{m} \\sum_{i=1}^{m} \\Phi \\left(\\sum_{j=1}^{N}w_jh_j(x_i), y_i \\right) + \\sum_{j=1}^{N} \\left(\\lambda r(h_j) + \\beta \\right) |w_j| $$\n",
        "\n",
        "where $w$ is the set of mixture weights, one per subnetwork $h$,\n",
        "$\\Phi$ is a surrogate loss function such as logistic loss or MSE, $r$ is a\n",
        "function for measuring a subnetwork's complexity, and $\\lambda$ and $\\beta$\n",
        "are hyperparameters.\n",
        "\n",
        "## Mixture weights\n",
        "\n",
        "So what are mixture weights? When forming an ensemble $f$ of subnetworks $h$,\n",
        "we need to somehow combine the their predictions. This is done by multiplying\n",
        "the outputs of subnetwork $h_i$ with mixture weight $w_i$, and summing the\n",
        "results:\n",
        "\n",
        "$$f(x) = \\sum_{j=1}^{N}w_jh_j(x)$$\n",
        "\n",
        "In practice, most commonly used set of mixture weight is **uniform average\n",
        "weighting**:\n",
        "\n",
        "$$f(x) = \\frac{1}{N}\\sum_{j=1}^{N}h_j(x)$$\n",
        "\n",
        "However, we can also solve a convex optimization problem to learn the mixture\n",
        "weights that minimize the loss function $\\Phi$:\n",
        "\n",
        "$$F(w) = \\frac{1}{m} \\sum_{i=1}^{m} \\Phi \\left(\\sum_{j=1}^{N}w_jh_j(x_i), y_i \\right)$$\n",
        "\n",
        "This is the first term in the AdaNet objective. The second term applies L1\n",
        "regularization to the mixture weights:\n",
        "\n",
        "$$\\sum_{j=1}^{N} \\left(\\lambda r(h_j) + \\beta \\right) |w_j|$$\n",
        "\n",
        "When $\\lambda \u003e 0$ this penalty serves to prevent the optimization from\n",
        "assigning too much weight to more complex subnetworks according to the\n",
        "complexity measure function $r$.\n",
        "\n",
        "## How AdaNet uses the objective\n",
        "\n",
        "This objective function serves two purposes:\n",
        "\n",
        "1.  To **learn to scale/transform the outputs of each subnetwork $h$** as part\n",
        "    of the ensemble.\n",
        "2.  To **select the best candidate subnetwork $h$** at each AdaNet iteration\n",
        "    to include in the ensemble.\n",
        "\n",
        "Effectively, when learning mixture weights $w$, AdaNet solves a convex\n",
        "combination of the outputs of the frozen subnetworks $h$. For $\\lambda \u003e0$,\n",
        "AdaNet penalizes more complex subnetworks with greater L1 regularization on\n",
        "their mixture weight, and will be less likely to select more complex subnetworks\n",
        "to add to the ensemble at each iteration.\n",
        "\n",
        "In this tutorial, in you will observe the benefits of using AdaNet to learn the\n",
        "ensemble's mixture weights and to perform candidate selection.\n",
        "\n"]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "id": "t4ptB-vwWEGH"
      },
      "outputs": [],
      "source": [
        "from __future__ import absolute_import\n",
        "from __future__ import division\n",
        "from __future__ import print_function\n",
        "\n",
        "import functools\n",
        "\n",
        "import adanet\n",
        "import tensorflow as tf\n",
        "\n",
        "# The random seed to use.\n",
        "RANDOM_SEED = 42"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ElcMSpyg_dYO"
      },
      "source": [
        "## Boston Housing dataset\n",
        "\n",
        "In this example, we will solve a regression task known as the [Boston Housing dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) to predict the price of suburban houses in Boston, MA in the 1970s. There are 13 numerical features, the labels are in thousands of dollars, and there are only 506 examples.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Mp5TTBOJ_ZTU"
      },
      "source": [
        "## Download the data\n",
        "Conveniently, the data is available via Keras:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "id": "Plx4LtD4_XFY"
      },
      "outputs": [],
      "source": [
        "(x_train, y_train), (x_test, y_test) = (\n",
        "    tf.keras.datasets.boston_housing.load_data())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "bUSb0JtOA5Xc"
      },
      "source": [
        "## Supply the data in TensorFlow\n",
        "\n",
        "Our first task is to supply the data in TensorFlow. Using the\n",
        "tf.estimator.Estimator covention, we will define a function that returns an\n",
        "input_fn which returns feature and label Tensors.\n",
        "\n",
        "We will also use the tf.data.Dataset API to feed the data into our models.\n",
        "\n",
        "Also, as a preprocessing step, we will apply `tf.log1p` to log-scale the\n",
        "features and labels for improved numerical stability during training. To recover\n",
        "the model's predictions in the correct scale, you can apply `tf.expm1` to the\n",
        "prediction."
      ]
    },
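    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "To make the round trip concrete, here is a minimal sketch that uses Python's\n",
        "`math.log1p` and `math.expm1` as scalar stand-ins for the elementwise\n",
        "`tf.log1p` and `tf.expm1`. The price value is an arbitrary example, not a row\n",
        "from the dataset.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "price = 24.0  # An example label, in thousands of dollars.\n",
        "scaled = math.log1p(price)  # log(1 + price): the scale the model trains on.\n",
        "recovered = math.expm1(scaled)  # exp(scaled) - 1: back to thousands of dollars.\n",
        "print(scaled, recovered)\n",
        "```\n",
        "\n",
        "Because the labels are log-scaled, the losses reported later in this tutorial\n",
        "are measured in log space rather than in thousands of dollars."
      ]
    },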
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "id": "H1U6DppVwTPM"
      },
      "outputs": [],
      "source": [
        "FEATURES_KEY = \"x\"\n",
        "\n",
        "\n",
        "def input_fn(partition, training, batch_size):\n",
        "  \"\"\"Generate an input function for the Estimator.\"\"\"\n",
        "\n",
        "  def _input_fn():\n",
        "\n",
        "    if partition == \"train\":\n",
        "      dataset = tf.data.Dataset.from_tensor_slices(({\n",
        "          FEATURES_KEY: tf.log1p(x_train)\n",
        "      }, tf.log1p(y_train)))\n",
        "    else:\n",
        "      dataset = tf.data.Dataset.from_tensor_slices(({\n",
        "          FEATURES_KEY: tf.log1p(x_test)\n",
        "      }, tf.log1p(y_test)))\n",
        "\n",
        "    # We call repeat after shuffling, rather than before, to prevent separate\n",
        "    # epochs from blending together.\n",
        "    if training:\n",
        "      dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()\n",
        "\n",
        "    dataset = dataset.batch(batch_size)\n",
        "    iterator = dataset.make_one_shot_iterator()\n",
        "    features, labels = iterator.get_next()\n",
        "    return features, labels\n",
        "\n",
        "  return _input_fn"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "XXy9pqlq_PQx"
      },
      "source": [
        "## Define the subnetwork generator\n",
        "\n",
        "Let's define a subnetwork generator similar to the one in\n",
        "[[Cortes et al., ICML 2017](https://arxiv.org/abs/1607.01097)] and in\n",
        "`simple_dnn.py` which creates two candidate fully-connected neural networks at\n",
        "each iteration with the same width, but one an additional hidden layer. To make\n",
        "our generator *adaptive*, each subnetwork will have at least the same number\n",
        "of hidden layers as the most recently added subnetwork to the\n",
        "`previous_ensemble`.\n",
        "\n",
        "We define the complexity measure function $r$ to be $r(h) = \\sqrt{d(h)}$, where\n",
        "$d$ is the number of hidden layers in the neural network $h$, to approximate the\n",
        "Rademacher bounds from\n",
        "[[Golowich et. al, 2017](https://arxiv.org/abs/1712.06541)]. So subnetworks\n",
        "with more hidden layers, and therefore more capacity, will have more heavily\n",
        "regularized mixture weights."
      ]
    },
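    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "As a small worked example, the coefficient $\\lambda r(h) + \\beta$ that multiplies\n",
        "$|w_j|$ in the objective can be tabulated by depth. This is a sketch under\n",
        "assumptions: `penalty_coefficient` is a hypothetical helper, and the value of\n",
        "`lam` is just an example.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "\n",
        "def penalty_coefficient(num_layers, lam, beta=0.0):\n",
        "  # Coefficient lambda * r(h) + beta applied to |w_j|, with\n",
        "  # r(h) = sqrt(number of hidden layers).\n",
        "  return lam * math.sqrt(num_layers) + beta\n",
        "\n",
        "\n",
        "for depth in range(5):\n",
        "  print(depth, penalty_coefficient(depth, lam=0.015))\n",
        "```\n",
        "\n",
        "Under this measure a linear model (zero hidden layers) has $r(h) = 0$, so with\n",
        "$\\beta = 0$ its mixture weight is not penalized at all, and the coefficient grows\n",
        "with the square root of the depth."
      ]
    },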
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "hZn9VeFUHvD-"
      },
      "outputs": [],
      "source": [
        "_NUM_LAYERS_KEY = \"num_layers\"\n",
        "\n",
        "\n",
        "class _SimpleDNNBuilder(adanet.subnetwork.Builder):\n",
        "  \"\"\"Builds a DNN subnetwork for AdaNet.\"\"\"\n",
        "\n",
        "  def __init__(self, optimizer, layer_size, num_layers, learn_mixture_weights,\n",
        "               seed):\n",
        "    \"\"\"Initializes a `_DNNBuilder`.\n",
        "\n",
        "    Args:\n",
        "      optimizer: An `Optimizer` instance for training both the subnetwork and\n",
        "        the mixture weights.\n",
        "      layer_size: The number of nodes to output at each hidden layer.\n",
        "      num_layers: The number of hidden layers.\n",
        "      learn_mixture_weights: Whether to solve a learning problem to find the\n",
        "        best mixture weights, or use their default value according to the\n",
        "        mixture weight type. When `False`, the subnetworks will return a no_op\n",
        "        for the mixture weight train op.\n",
        "      seed: A random seed.\n",
        "\n",
        "    Returns:\n",
        "      An instance of `_SimpleDNNBuilder`.\n",
        "    \"\"\"\n",
        "\n",
        "    self._optimizer = optimizer\n",
        "    self._layer_size = layer_size\n",
        "    self._num_layers = num_layers\n",
        "    self._learn_mixture_weights = learn_mixture_weights\n",
        "    self._seed = seed\n",
        "\n",
        "  def build_subnetwork(self,\n",
        "                       features,\n",
        "                       logits_dimension,\n",
        "                       training,\n",
        "                       iteration_step,\n",
        "                       summary,\n",
        "                       previous_ensemble=None):\n",
        "    \"\"\"See `adanet.subnetwork.Builder`.\"\"\"\n",
        "\n",
        "    input_layer = tf.to_float(features[FEATURES_KEY])\n",
        "    kernel_initializer = tf.glorot_uniform_initializer(seed=self._seed)\n",
        "    last_layer = input_layer\n",
        "    for _ in range(self._num_layers):\n",
        "      last_layer = tf.layers.dense(\n",
        "          last_layer,\n",
        "          units=self._layer_size,\n",
        "          activation=tf.nn.relu,\n",
        "          kernel_initializer=kernel_initializer)\n",
        "    logits = tf.layers.dense(\n",
        "        last_layer,\n",
        "        units=logits_dimension,\n",
        "        kernel_initializer=kernel_initializer)\n",
        "\n",
        "    persisted_tensors = {_NUM_LAYERS_KEY: tf.constant(self._num_layers)}\n",
        "    return adanet.Subnetwork(\n",
        "        last_layer=last_layer,\n",
        "        logits=logits,\n",
        "        complexity=self._measure_complexity(),\n",
        "        persisted_tensors=persisted_tensors)\n",
        "\n",
        "  def _measure_complexity(self):\n",
        "    \"\"\"Approximates Rademacher complexity as the square-root of the depth.\"\"\"\n",
        "    return tf.sqrt(tf.to_float(self._num_layers))\n",
        "\n",
        "  def build_subnetwork_train_op(self, subnetwork, loss, var_list, labels,\n",
        "                                iteration_step, summary, previous_ensemble):\n",
        "    \"\"\"See `adanet.subnetwork.Builder`.\"\"\"\n",
        "    return self._optimizer.minimize(loss=loss, var_list=var_list)\n",
        "\n",
        "  def build_mixture_weights_train_op(self, loss, var_list, logits, labels,\n",
        "                                     iteration_step, summary):\n",
        "    \"\"\"See `adanet.subnetwork.Builder`.\"\"\"\n",
        "\n",
        "    if not self._learn_mixture_weights:\n",
        "      return tf.no_op()\n",
        "    return self._optimizer.minimize(loss=loss, var_list=var_list)\n",
        "\n",
        "  @property\n",
        "  def name(self):\n",
        "    \"\"\"See `adanet.subnetwork.Builder`.\"\"\"\n",
        "\n",
        "    if self._num_layers == 0:\n",
        "      # A DNN with no hidden layers is a linear model.\n",
        "      return \"linear\"\n",
        "    return \"{}_layer_dnn\".format(self._num_layers)\n",
        "\n",
        "\n",
        "class SimpleDNNGenerator(adanet.subnetwork.Generator):\n",
        "  \"\"\"Generates a two DNN subnetworks at each iteration.\n",
        "\n",
        "  The first DNN has an identical shape to the most recently added subnetwork\n",
        "  in `previous_ensemble`. The second has the same shape plus one more dense\n",
        "  layer on top. This is similar to the adaptive network presented in Figure 2 of\n",
        "  [Cortes et al. ICML 2017](https://arxiv.org/abs/1607.01097), without the\n",
        "  connections to hidden layers of networks from previous iterations.\n",
        "  \"\"\"\n",
        "\n",
        "  def __init__(self,\n",
        "               optimizer,\n",
        "               layer_size=32,\n",
        "               learn_mixture_weights=False,\n",
        "               seed=None):\n",
        "    \"\"\"Initializes a DNN `Generator`.\n",
        "\n",
        "    Args:\n",
        "      optimizer: An `Optimizer` instance for training both the subnetwork and\n",
        "        the mixture weights.\n",
        "      layer_size: Number of nodes in each hidden layer of the subnetwork\n",
        "        candidates. Note that this parameter is ignored in a DNN with no hidden\n",
        "        layers.\n",
        "      learn_mixture_weights: Whether to solve a learning problem to find the\n",
        "        best mixture weights, or use their default value according to the\n",
        "        mixture weight type. When `False`, the subnetworks will return a no_op\n",
        "        for the mixture weight train op.\n",
        "      seed: A random seed.\n",
        "\n",
        "    Returns:\n",
        "      An instance of `Generator`.\n",
        "    \"\"\"\n",
        "\n",
        "    self._seed = seed\n",
        "    self._dnn_builder_fn = functools.partial(\n",
        "        _SimpleDNNBuilder,\n",
        "        optimizer=optimizer,\n",
        "        layer_size=layer_size,\n",
        "        learn_mixture_weights=learn_mixture_weights)\n",
        "\n",
        "  def generate_candidates(self, previous_ensemble, iteration_number,\n",
        "                          previous_ensemble_reports, all_reports):\n",
        "    \"\"\"See `adanet.subnetwork.Generator`.\"\"\"\n",
        "\n",
        "    num_layers = 0\n",
        "    seed = self._seed\n",
        "    if previous_ensemble:\n",
        "      num_layers = tf.contrib.util.constant_value(\n",
        "          previous_ensemble.weighted_subnetworks[\n",
        "              -1].subnetwork.persisted_tensors[_NUM_LAYERS_KEY])\n",
        "    if seed is not None:\n",
        "      seed += iteration_number\n",
        "    return [\n",
        "        self._dnn_builder_fn(num_layers=num_layers, seed=seed),\n",
        "        self._dnn_builder_fn(num_layers=num_layers + 1, seed=seed),\n",
        "    ]"
      ]
    },
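    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The way this generator grows its search space can be traced without\n",
        "TensorFlow. The sketch below mirrors only the depth bookkeeping in\n",
        "`generate_candidates`; `candidate_depths` is a hypothetical helper, and the\n",
        "rule that the deeper candidate always wins is an assumption made for\n",
        "illustration.\n",
        "\n",
        "```python\n",
        "def candidate_depths(previous_depths):\n",
        "  # previous_depths: depths of the subnetworks already in the ensemble,\n",
        "  # in the order they were added (empty at the first iteration).\n",
        "  depth = previous_depths[-1] if previous_depths else 0\n",
        "  return [depth, depth + 1]\n",
        "\n",
        "\n",
        "ensemble = []\n",
        "for _ in range(3):\n",
        "  same, deeper = candidate_depths(ensemble)\n",
        "  ensemble.append(deeper)  # Pretend the deeper candidate always wins.\n",
        "print(ensemble)\n",
        "```\n",
        "\n",
        "At the first iteration the candidates are a linear model and a one-layer DNN.\n",
        "Each later iteration starts from the depth of the most recently added\n",
        "subnetwork, so the ensemble can only stay at the same depth or grow deeper."
      ]
    },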
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "vJRH-v5NhnY9"
      },
      "source": [
        "## Train and evaluate\n",
        "\n",
        "Next we create an `adanet.Estimator` using the `SimpleDNNGenerator` we just defined.\n",
        "\n",
        "In this section we will show the effects of two hyperparamters: **learning mixture weights** and **complexity regularization**.\n",
        "\n",
        "On the righthand side you will be able to play with the hyperparameters of this model. Until you reach the end of this section, we ask that you not change them. \n",
        "\n",
        "At first we will not learn the mixture weights, using their default initial value. Here they will be scalars initialized to $1/N$ where $N$ is the number of subnetworks in the ensemble, effectively creating a **uniform average ensemble**."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "both",
        "colab": {
          "height": 53,
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "executionInfo": {
          "elapsed": 227778,
          "status": "ok",
          "timestamp": 1539111275139,
          "user": {
            "displayName": "",
            "photoUrl": "",
            "userId": ""
          },
          "user_tz": 240
        },
        "id": "8TJHr1k-zYpN",
        "outputId": "0cd3cd50-31f4-47a3-c3cf-233a9687b721"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loss: 0.0348092\n",
            "Architecture: | 1_layer_dnn | 2_layer_dnn | 3_layer_dnn | 4_layer_dnn | 5_layer_dnn |\n"
          ]
        }
      ],
      "source": [
        "#@title AdaNet parameters\n",
        "LEARNING_RATE = 0.001  #@param {type:\"number\"}\n",
        "TRAIN_STEPS = 100000  #@param {type:\"integer\"}\n",
        "BATCH_SIZE = 32  #@param {type:\"integer\"}\n",
        "\n",
        "LEARN_MIXTURE_WEIGHTS = False  #@param {type:\"boolean\"}\n",
        "ADANET_LAMBDA = 0  #@param {type:\"number\"}\n",
        "BOOSTING_ITERATIONS = 5  #@param {type:\"integer\"}\n",
        "\n",
        "\n",
        "def train_and_evaluate(learn_mixture_weights=LEARN_MIXTURE_WEIGHTS,\n",
        "                       adanet_lambda=ADANET_LAMBDA):\n",
        "  \"\"\"Trains an `adanet.Estimator` to predict housing prices.\"\"\"\n",
        "\n",
        "  estimator = adanet.Estimator(\n",
        "      # Since we are predicting housing prices, we'll use a regression\n",
        "      # head that optimizes for MSE.\n",
        "      head=tf.contrib.estimator.regression_head(\n",
        "          loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE),\n",
        "\n",
        "      # Define the generator, which defines our search space of subnetworks\n",
        "      # to train as candidates to add to the final AdaNet model.\n",
        "      subnetwork_generator=SimpleDNNGenerator(\n",
        "          optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),\n",
        "          learn_mixture_weights=learn_mixture_weights,\n",
        "          seed=RANDOM_SEED),\n",
        "\n",
        "      # Lambda is a the strength of complexity regularization. A larger\n",
        "      # value will penalize more complex subnetworks.\n",
        "      adanet_lambda=adanet_lambda,\n",
        "\n",
        "      # The number of train steps per iteration.\n",
        "      max_iteration_steps=TRAIN_STEPS // BOOSTING_ITERATIONS,\n",
        "\n",
        "      # The evaluator will evaluate the model on the full training set to\n",
        "      # compute the overall AdaNet loss (train loss + complexity\n",
        "      # regularization) to select the best candidate to include in the\n",
        "      # final AdaNet model.\n",
        "      evaluator=adanet.Evaluator(\n",
        "          input_fn=input_fn(\"train\", training=False, batch_size=BATCH_SIZE)),\n",
        "\n",
        "      # The report materializer will evaluate the subnetworks' metrics\n",
        "      # using the full training set to generate the reports that the generator\n",
        "      # can use in the next iteration to modify its search space.\n",
        "      report_materializer=adanet.ReportMaterializer(\n",
        "          input_fn=input_fn(\"train\", training=False, batch_size=BATCH_SIZE)),\n",
        "\n",
        "      # Configuration for Estimators.\n",
        "      config=tf.estimator.RunConfig(\n",
        "          save_checkpoints_steps=50000,\n",
        "          save_summary_steps=50000,\n",
        "          tf_random_seed=RANDOM_SEED))\n",
        "\n",
        "  # Train and evaluate using using the tf.estimator tooling.\n",
        "  train_spec = tf.estimator.TrainSpec(\n",
        "      input_fn=input_fn(\"train\", training=True, batch_size=BATCH_SIZE),\n",
        "      max_steps=TRAIN_STEPS)\n",
        "  eval_spec = tf.estimator.EvalSpec(\n",
        "      input_fn=input_fn(\"test\", training=False, batch_size=BATCH_SIZE),\n",
        "      steps=None)\n",
        "  return tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n",
        "\n",
        "\n",
        "def ensemble_architecture(result):\n",
        "  \"\"\"Extracts the ensemble architecture from evaluation results.\"\"\"\n",
        "\n",
        "  architecture = result[\"architecture/adanet/ensembles\"]\n",
        "  # The architecture is a serialized Summary proto for TensorBoard.\n",
        "  summary_proto = tf.summary.Summary.FromString(architecture)\n",
        "  return summary_proto.value[0].tensor.string_val[0]\n",
        "\n",
        "\n",
        "results, _ = train_and_evaluate()\n",
        "print(\"Loss:\", results[\"average_loss\"])\n",
        "print(\"Architecture:\", ensemble_architecture(results))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "A0vBGKeTj0Kf"
      },
      "source": [
        "These hyperparameters preduce a model that achieves **0.0348** MSE on the test\n",
        "set. Notice that the ensemble is composed of 5 subnetworks, each one a hidden\n",
        "layer deeper than the previous. The most complex subnetwork is made of 5 hidden\n",
        "layers.\n",
        "\n",
        "Since `SimpleDNNGenerator` produces subnetworks of varying complexity, and our\n",
        "model gives each one an equal weight, AdaNet selected the subnetwork that most\n",
        "lowered the ensemble's training loss at each iteration, likely the one with the\n",
        "most hidden layers, since it has the most capacity, and we aren't penalizing\n",
        "more complex subnetworks (yet).\n",
        "\n",
        "Next, instead of assigning equal weight to each subnetwork, let's learn the\n",
        "mixture weights as a convex optimization problem using SGD:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "height": 71,
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "executionInfo": {
          "elapsed": 234614,
          "status": "ok",
          "timestamp": 1539111509785,
          "user": {
            "displayName": "",
            "photoUrl": "",
            "userId": ""
          },
          "user_tz": 240
        },
        "id": "RXfJ6cb_KO35",
        "outputId": "d8e8e404-ed03-4d41-ce46-fd82ea75eff1"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loss: 0.0449085\n",
            "Uniform average loss: 0.0348092\n",
            "Architecture: | 1_layer_dnn | 2_layer_dnn | 3_layer_dnn | 4_layer_dnn | 5_layer_dnn |\n"
          ]
        }
      ],
      "source": [
        "#@test {\"skip\": true}\n",
        "results, _ = train_and_evaluate(learn_mixture_weights=True)\n",
        "print(\"Loss:\", results[\"average_loss\"])\n",
        "print(\"Uniform average loss:\", results[\"average_loss/adanet/uniform_average_ensemble\"])\n",
        "print(\"Architecture:\", ensemble_architecture(results))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "BTUdJ_FWl84O"
      },
      "source": [
        "Learning the mixture weights produces a model with **0.0449** MSE, a bit worse\n",
        "than the uniform average model, which the `adanet.Estimator` always compute as a\n",
        "baseline. The mixture weights were learned without regularization, so they\n",
        "likely overfit to the training set.\n",
        "\n",
        "Observe that AdaNet learned the same ensemble composition as the previous run.\n",
        "Without complexity regularization, AdaNet will favor more complex subnetworks,\n",
        "which may have worse generalization despite improving the empirical error.\n",
        "\n",
        "Finally, let's apply some **complexity regularization** by using $\\lambda \u003e 0$.\n",
        "Since this will penalize more complex subnetworks, AdaNet will select the\n",
        "candidate subnetwork that most improves the objective for its marginal\n",
        "complexity:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "height": 71,
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "executionInfo": {
          "elapsed": 211897,
          "status": "ok",
          "timestamp": 1539111721710,
          "user": {
            "displayName": "",
            "photoUrl": "",
            "userId": ""
          },
          "user_tz": 240
        },
        "id": "3g_C-dT7KaTy",
        "outputId": "f3c3012a-7906-4222-b603-db3ce849c33c"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loss: 0.0320289\n",
            "Uniform average loss: 0.0345772\n",
            "Architecture: | linear | 1_layer_dnn | 2_layer_dnn | 2_layer_dnn | 3_layer_dnn |\n"
          ]
        }
      ],
      "source": [
        "#@test {\"skip\": true}\n",
        "results, _ = train_and_evaluate(learn_mixture_weights=True, adanet_lambda=.015)\n",
        "print(\"Loss:\", results[\"average_loss\"])\n",
        "print(\"Uniform average loss:\", results[\"average_loss/adanet/uniform_average_ensemble\"])\n",
        "print(\"Architecture:\", ensemble_architecture(results))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "s5RuM582oxo_"
      },
      "source": [
        "Learning the mixture weights with $\\lambda \u003e 0$ produces a model with **0.0320**\n",
        "MSE. Notice that this is even better than the uniform average ensemble produced\n",
        "from the chosen subnetworks with **0.0345** MSE.\n",
        "\n",
        "Inspecting the ensemble architecture demonstrates the effects of complexity\n",
        "regularization on candidate selection. The selected subnetworks are relatively\n",
        "less complex: unlike in previous runs, the simplest subnetwork is a linear model\n",
        "and the deepest subnetwork has only 3 hidden layers.\n",
        "\n",
        "In general, learning to combine subnetwork ouputs with optimal hyperparameters\n",
        "should be at least as good assigning uniform average weights."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ajRhOvHGQu6w"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "In this tutorial, you were able to explore training an AdaNet model's mixture\n",
        "weights with $\\lambda \\ge 0$. You were also able to compare against building an\n",
        "ensemble formed by always choosing the best candidate subnetwork at each\n",
        "iteration based on it's ability to improve the ensemble's loss on the training\n",
        "set, and averaging their results.\n",
        "\n",
        "Uniform average ensembles work unreasonably well in practice, yet learning the\n",
        "mixture weights with the correct values of $\\lambda$ and $\\beta$ should always\n",
        "produce a better model when candidates have varying complexity. However, this\n",
        "does require some additional hyperparameter tuning, so practically you can train\n",
        "an AdaNet with the default mixture weights and $\\lambda=0$ first, and once you\n",
        "have confirmed that the subnetworks are training correctly, you can tune the\n",
        "mixture weight hyperparameters.\n",
        "\n",
        "While this example explored a regression task, these observations apply to using\n",
        "AdaNet on other tasks like binary-classification and multi-class classification."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
            "name": "adanet_objective.ipynb",
      "provenance": [],
      "version": "0.3.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
